# First we make sure that devtools is installed in your R environment
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}
# Then we can install the package by running the following command
devtools::install_github('Moustapha-A/autopolate')
Downloading GitHub repo Moustapha-A/autopolate@master
from URL https://api.github.com/repos/Moustapha-A/autopolate/zipball/master
Installing autopolate
'/usr/lib/R/bin/R' --no-site-file --no-environ --no-save --no-restore --quiet CMD INSTALL \
'/tmp/RtmpLnb0JF/devtools35a430d73f9a/Moustapha-A-autopolate-87961a8' \
--library='/home/qpc/R/i686-pc-linux-gnu-library/3.3' --install-tests
* installing *source* package ‘autopolate’ ...
** R
** data
*** moving datasets to lazyload DB
** preparing package for lazy loading
* Adding fda to Imports
Next:
Refer to functions with fda::fun()
* Adding data.table to Imports
Next:
Refer to functions with data.table::fun()
* Adding plotly to Suggests
Next:
Use requireNamespace("plotly", quietly = TRUE) to test if package is installed,
then use plotly::fun() to refer to functions.
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (autopolate)
Reloading installed autopolate
First we install the package from GitHub using devtools::install_github.
library(autopolate)
Then we load the package like any other library.
data("EZ68")
EZ68 = na.omit(EZ68)
Now we load some data from one of Polluscope's campaigns, collected with the Ecomzen 68 sensor. Rows with missing values are dropped with na.omit.
head(EZ68,10)
The first 10 records of EZ68
plot(as.POSIXct(EZ68$`Date (UTC)`,format="%Y-%m-%d %H:%M:%S"),EZ68$NO2, main = "Ecomzen 68 NO2 TS", xlab = "Time", ylab = "NO2")
The plot of NO2 against Time
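Besides looking at the plot, one way to check how regular the sampling is would be to compute the gaps between consecutive timestamps. The sketch below uses a few made-up timestamps, not the EZ68 data; the names start, gaps, times and intervals are only for illustration.

```r
# Synthetic timestamps (not the EZ68 data): gaps of roughly 1 and 2 minutes
start <- as.POSIXct("2018-01-01 00:00:00", tz = "UTC")
gaps  <- c(60, 62, 118, 59, 121, 60)   # seconds between consecutive readings
times <- start + cumsum(c(0, gaps))    # POSIXct + numeric adds seconds

# Differences between consecutive timestamps, in seconds
intervals <- diff(as.numeric(times))

summary(intervals)             # spread of the sampling interval
table(round(intervals / 60))   # count of gaps that are about 1 min vs about 2 min
```

For the real data, times would presumably come from as.POSIXct(EZ68$`Date (UTC)`, format = "%Y-%m-%d %H:%M:%S"), as in the plot call above.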
We can see that the acquisition rate of the Ecomzen sensor is not stable: sometimes it is ~1 min, at other times ~2 min. What we want is to have the values at a stable rate. Let us say that we need the data at a rate of one value every 30 seconds. We will do that using the autopolate function I developed:
NO2_ts = autopolate(dataframe = EZ68,
                    timeCol = "Date (UTC)",
                    timeFrmt = "%Y-%m-%d %H:%M:%S",
                    valueCol = "NO2",
                    targetRate = 30)
head(NO2_ts,10)
plot(NO2_ts$`Date (UTC)`,NO2_ts$NO2, main = "Treated Ecomzen 68 NO2 TS", xlab = "Time", ylab = "NO2")
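To make concrete what resampling to a fixed 30-second rate means, here is a minimal sketch in base R. It uses plain linear interpolation (approx) on made-up values as a stand-in; this is not how autopolate works internally, and all variable names are hypothetical.

```r
# Synthetic, irregularly sampled series (not the EZ68 data)
start  <- as.POSIXct("2018-01-01 00:00:00", tz = "UTC")
times  <- start + c(0, 60, 180, 240, 360)   # gaps of 1 and 2 minutes
values <- c(10, 12, 11, 15, 14)

# Regular target grid: one point every 30 seconds over the observed range
target_rate <- 30
grid <- seq(from = min(times), to = max(times), by = target_rate)

# Linear interpolation onto the grid (a stand-in, not autopolate's method)
resampled <- approx(x = as.numeric(times), y = values, xout = as.numeric(grid))

result <- data.frame(time = grid, value = resampled$y)
head(result)
```

Whatever interpolation method is used, the output should have this shape: one row per 30-second step between the first and last observation.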
However, it is always preferable to check the quality of the interpolation before relying on it. That is why I added some metrics to the interpolation process: the Root Mean Squared Error (RMSE), the plot of the initial data against the interpolated curves, and the plot of the residuals. They are enabled by adding plot=TRUE, residual=TRUE and RMSE=TRUE.
NO2_ts = autopolate(dataframe = EZ68,
                    timeCol = "Date (UTC)",
                    timeFrmt = "%Y-%m-%d %H:%M:%S",
                    valueCol = "NO2",
                    targetRate = 30,
                    plot = TRUE,
                    residual = TRUE,
                    RMSE = TRUE)
[1] "##################### RMSE #####################"
[1] "RMSE of function 1 : 0.53878735145198"
[1] "Average RMSE : 0.53878735145198"
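For readers less familiar with these metrics, the sketch below shows how residuals and an RMSE are computed in general. The numbers are made up, and the code only illustrates the standard definitions; it is not taken from the autopolate source.

```r
# Hypothetical observed values and the fitted curve evaluated at the same times
observed_values <- c(10.0, 12.0, 11.0, 15.0, 14.0)
fitted_values   <- c(10.5, 11.5, 11.0, 14.0, 14.5)

# Residuals: observed minus fitted, one per original record
resid_values <- observed_values - fitted_values

# Root mean squared error: square the residuals, average, take the square root
rmse <- sqrt(mean(resid_values^2))
rmse
```

The RMSE is expressed in the same unit as the measured variable (here NO2), so it can be compared directly with the range of values in the plot.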
NO2_ts = autopolate(dataframe = EZ68,
                    timeCol = "Date (UTC)",
                    timeFrmt = "%Y-%m-%d %H:%M:%S",
                    valueCol = "NO2",
                    targetRate = 30,
                    plot = TRUE,
                    residual = TRUE,
                    interactive = TRUE,
                    RMSE = TRUE)
[1] "##################### RMSE #####################"
[1] "RMSE of function 1 : 0.53878735145198"
[1] "Average RMSE : 0.53878735145198"
You can also add the interactive=TRUE argument to get interactive plots. However, this requires the plotly package to be installed.
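Since plotly is only a suggested dependency, it can be useful to check for it before asking for interactive plots. A minimal sketch, where use_interactive is a hypothetical variable name:

```r
# TRUE only if plotly is installed; this does not attach the package
use_interactive <- requireNamespace("plotly", quietly = TRUE)

if (!use_interactive) {
  message("plotly is not installed; falling back to static plots. ",
          "Run install.packages('plotly') to enable interactive plots.")
}

# The flag can then be passed to autopolate as interactive = use_interactive
```

This way the same script runs on machines with and without plotly.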
NO2_ts = autopolate(dataframe = EZ68,
                    timeCol = "Date (UTC)",
                    timeFrmt = "%Y-%m-%d %H:%M:%S",
                    valueCol = "NO2",
                    targetRate = 30,
                    basisRatio = 0.5,
                    plot = TRUE,
                    residual = TRUE,
                    interactive = TRUE,
                    RMSE = TRUE)
[1] "##################### RMSE #####################"
[1] "RMSE of function 1 : 0.28318848383448"
[1] "Average RMSE : 0.28318848383448"
To fit the data more closely, the user can use the basisRatio argument to increase the number of basis functions. If it is not specified, it defaults to 0.1, meaning that the number of basis functions is 0.1 times the number of initial data records. In our case we have 441 records, so the default is 0.1*441 = 44.1, rounded up to 45 basis functions. In the example above we used basisRatio=0.5, i.e. 0.5*441 = 220.5, rounded up to 221 basis functions. We can see the difference in the average RMSE between interpolating with 45 and with 221 basis functions: about 0.539 versus about 0.283.
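The arithmetic above, and the effect of the number of basis functions on the RMSE, can be sketched with base R. The rounding-up rule is an assumption inferred from the counts quoted above (45 and 221). The fit uses a B-spline basis from the splines package on synthetic data as a stand-in, so it is not autopolate's own implementation, and n_basis and fit_rmse are hypothetical helper names.

```r
library(splines)  # ships with base R

n_records <- 441

# Number of basis functions for a given ratio, rounding up (assumed rule)
n_basis <- function(ratio, n) ceiling(ratio * n)
n_basis(0.1, n_records)   # 45
n_basis(0.5, n_records)   # 221

# Synthetic noisy signal standing in for the sensor data
set.seed(1)
x <- seq(0, 1, length.out = n_records)
y <- sin(6 * pi * x) + rnorm(n_records, sd = 0.3)

# Least-squares fit on a cubic B-spline basis with k basis functions;
# returns the RMSE of the fit on the original points
fit_rmse <- function(k) {
  basis <- bs(x, df = k, intercept = TRUE)   # matrix with k columns
  fit   <- lm(y ~ basis - 1)                 # no extra intercept column
  sqrt(mean(resid(fit)^2))
}

rmse_small <- fit_rmse(n_basis(0.1, n_records))
rmse_large <- fit_rmse(n_basis(0.5, n_records))
c(basis_45 = rmse_small, basis_221 = rmse_large)
```

A lower RMSE on the original points does not by itself mean a better interpolation: with many basis functions the curve also starts following the measurement noise, so the residual plot is worth checking alongside the RMSE.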